# README

Replication package for "Who Clears the Market When Passive Investors Trade?" by Marco Sammon and John Shim.

## Overview / Steps to Run Replication Code
This replication package contains all code needed to replicate the tables and figures in the paper. To run it, set the path variables in `run_analysis.do` to the location of the replication folder, then run `run_analysis.do`. That file automatically runs all files that clean the data, run the analysis, and produce the tables and figures. The files must run in the order given there, because each step generates intermediate files that later steps need. In addition, lines 9-21 of `run_analysis.do` install the required user-written Stata packages.

We include dummy data (i.e., pseudo-data): a randomized subset of the raw data files we use. This protects data we do not have the right to distribute. With the raw data files, the package takes about 1 hour to run on the hardware described below. With dummy data, it takes about 10 minutes. All code runs automatically and produces all tables and figures in the paper. Tables are then reformatted manually in Excel. Specifically, the file "Tables" in the output folder contains the formatted versions of the tables based on the raw output. The authors have verified that the code runs automatically from `run_analysis.do` with both the actual raw data and the dummy raw data (switch between them by changing the global variable flag in `run_analysis.do`).
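As a rough illustration of the setup step, the edits near the top of `run_analysis.do` might look something like the sketch below. The global names `root` and `use_dummy` and the path are hypothetical placeholders, not the names used in the package; use the path variables and data flag actually defined in `run_analysis.do`.

```stata
* Sketch only: root and use_dummy are hypothetical names. Edit the
* globals that run_analysis.do actually defines instead.
global root "/path/to/replication_folder"  // location of the replication folder
global use_dummy 1                         // assumed convention: 1 = dummy data, 0 = actual raw data

* After saving the edits, run the master file from the Stata command window:
* do "/path/to/replication_folder/run_analysis.do"
```

Because `run_analysis.do` calls every other file in order, no other file should need to be run by hand.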

NOTE: The dummy data do not replicate the results in the paper; they are meant to demonstrate that the replication code works and how it works. Replacing the dummy files with the complete datasets from the widely accessible data providers will reproduce the results in the paper. The code is licensed under the MIT license. See LICENSE.txt for details.


## Data Availability and Provenance
This paper uses widely available data sources. We certify that we have legitimate access to use the data in this paper. We do not have the right to distribute the data. As a result, we have included dummy versions of the data: each dummy file is a subset of the actual raw data that has then been randomized. The actual raw data come from widely available sources, but using them requires subscriptions. The dummy data are distributed under a Creative Commons Attribution 4.0 (CC BY 4.0) license. The replication package does not violate license agreements. The authors will preserve all actual raw data for a minimum of five years after publication.


## Software Requirements and Run Times

1. Code was run on Stata MP 18.5. The code in `run_analysis.do` will install all required packages.

2. Approximate run times (using Stata MP 18.5, 16 cores, on a Mac M2 Ultra):
   - With actual raw data (not included in the public replication package), the code takes about 1 hour to complete. The raw data are 53.63 GB, and the clean data and generated output occupy 40 GB.
   - With dummy raw data (included in the public replication package), the code takes about 10 minutes to complete. The dummy raw data are 12.3 GB, and the clean data and generated output occupy less than 5 GB.

3. The code was tested both on Windows and Mac versions of Stata.  The code was last run on a Dell Tower ECT1250 desktop running Microsoft Windows 11 Home (Version 10.0.26200, Build 26200). The system is powered by an Intel Core Ultra 7 265 processor (20 cores, 20 logical processors, 2.40 GHz) and has 32 GB of installed RAM (31.5 GB usable). The machine is x64-based, running in UEFI BIOS mode with Secure Boot enabled. At the time of execution, approximately 13 GB of physical memory and 7 GB of virtual memory were available. 

4. There is no randomness in the actual generation of results (so no seeds need to be set). 


## Description of Raw Data Files Needed
All datasets are in Stata format. All data are accessed from subscriptions via WRDS unless otherwise noted below.

### CRSP Daily Stock Data
- `crsp_daily_1970.dta` - Daily stock return data from CRSP from 1970–1979
- `crsp_daily_1980.dta` - Daily stock return data from CRSP from 1980–1989
- `crsp_daily_1990.dta` - Daily stock return data from CRSP from 1990–1999
- `crsp_daily_2000.dta` - Daily stock return data from CRSP from 2000–2009
- `crsp_daily_2010.dta` - Daily stock return data from CRSP from 2010–2019
- `crsp_daily_2020.dta` - Daily stock return data from CRSP from 2020–2023
- `crsp_daily_index.dta` - Daily CRSP value-weighted market index returns (VWRETD)
- `crsp_daily_volume.dta` - Daily CRSP share trading volume data from 1970–2023

### CRSP Monthly Stock Data
- `crsp_monthly.dta` - Monthly CRSP stock returns data (includes delisting paydate observations)
- `crsp_monthly_nodelist.dta` - Monthly CRSP stock returns data (does not include delisting paydate observations)
- `crsp_monthly_index.dta` - Monthly CRSP value-weighted market index returns (VWRETD)
- `crsp_ticker_cusip_map.dta` - Historical mapping between CRSP permnos, tickers and CUSIPs over time
- `crsptrindex_2024.dta` - CRSP Total Market Index data
- `daily_map.dta` - Daily stock data identifiers

### CRSP Mutual Fund Data
- `crsp_fundno_wficn_link.dta` - Link table between CRSP mutual fund fundno and WFICN identifiers from MF links
- `crsp_mf_returns.dta` - CRSP monthly mutual fund returns and NAV data
- `crsp_mf_summary_annual.dta` - Annual summary characteristics for CRSP mutual funds
- `crsp_mf_summary_quarterly.dta` - Quarterly summary characteristics for CRSP mutual funds

### Thomson Reuters S12 Mutual Fund Holdings
- `thompson_s12_data.dta` - Raw Thomson s12 mutual fund holdings data (manager–CUSIP–position)
- `thompson_s12_summary.dta` - Aggregated manager-level summary statistics from Thomson s12 data
- `s12_fundno_wficn_link.dta` - Link table between Thomson s12 fundno identifiers and WFICN mutual fund IDs from MFlinks
- `vanguard_s12_fundno.dta` - Vanguard S12 fundno identifiers

### Thomson Reuters 13F Institutional Holdings
- `thompson_13f_holdings.dta` - Thomson Reuters 13F institutional holdings data
- `thompson_13f_summary.dta` - Aggregated institution-level identifiers from Thomson Reuters 13F data
- `thompson_insiders_table1.dta` - Thomson Reuters Insider dataset

### Compustat Data
- `ccm_mapping.dta` - CRSP/Compustat (CCM) linking table
- `ccm_monthly_security.dta` - Linked data from CRSP/Compustat merged dataset
- `compustat_compensation.dta` - Compustat equity compensation data
- `compustat_short_interest.dta` - Compustat short interest data
- `ff_profit.dta` - Compustat profitability and accounting data
- `imported_compensation_data.dta` - RSU/PSU/stock option data from 10-K filings 
- `leverage_ratios.dta` - Compustat leverage ratio and accounting data
- `raw_buyback_data.dta` - Compustat share buyback data
- `wrds_ratio_book_to_market.dta` - Book-to-market ratios computed from WRDS financial ratios suite

### Index Data
- `active_share_major_benchmarks.dta` - S12 funds and associated Active Share (Cremers and Petajisto) benchmarks (S12 data combined with Active Share benchmark data provided by Martijn Cremers and Tim Riley)
- `russell_1000_2000.dta` - Russell 1000/2000 index constituent data (provided by FTSE Russell)
- `russ_monthly.dta` - All Russell index constituent data (provided by FTSE Russell)
- `sp_1500.dta` - S&P 1500 index constituent data (provided by Standard & Poor's)

### Asset Pricing and Factor Data
- `ff3daily.dta` - Daily Fama–French 3-factor returns (MKT, SMB, HML) (retrieved from Ken French's website)
- `capm_betas.dta` - CAPM betas from WRDS beta suite
- `fred_cpi.dta` - FRED U.S. CPI series (retrieved from FRED)

### Other Data
- `raw_bushee_data.dta` - Brian Bushee 13F institutional investor classification data (13F data combined with data retrieved from Brian Bushee's website)
- `seo_imported.dta` - Seasoned equity offering (SEO) data (retrieved from Bloomberg)


# Full Dataset and Table/Figure List

## Dataset list

| Data file | Source | Notes | Provided |
| --- | --- | --- | --- |
| `crsp_daily_1970.dta` | CRSP | Daily stock returns 1970–1979 | Yes (dummy) |
| `crsp_daily_1980.dta` | CRSP | Daily stock returns 1980–1989 | Yes (dummy) |
| `crsp_daily_1990.dta` | CRSP | Daily stock returns 1990–1999 | Yes (dummy) |
| `crsp_daily_2000.dta` | CRSP | Daily stock returns 2000–2009 | Yes (dummy) |
| `crsp_daily_2010.dta` | CRSP | Daily stock returns 2010–2019 | Yes (dummy) |
| `crsp_daily_2020.dta` | CRSP | Daily stock returns 2020–2023 | Yes (dummy) |
| `crsp_daily_index.dta` | CRSP | Daily VWRETD index returns | Yes (dummy) |
| `crsp_daily_volume.dta` | CRSP | Daily share volume 1970–2023 | Yes (dummy) |
| `crsp_monthly.dta` | CRSP | Monthly stock returns (with delisting) | Yes (dummy) |
| `crsp_monthly_nodelist.dta` | CRSP | Monthly stock returns (no delisting) | Yes (dummy) |
| `crsp_monthly_index.dta` | CRSP | Monthly VWRETD index returns | Yes (dummy) |
| `crsp_ticker_cusip_map.dta` | CRSP | Permno–ticker–CUSIP mapping | Yes (dummy) |
| `crsptrindex_2024.dta` | CRSP | CRSP total return index  | Yes (dummy) |
| `daily_map.dta` | CRSP | Daily stock data identifiers | Yes (dummy) |
| `crsp_fundno_wficn_link.dta` | CRSP/MFlinks | Fundno to WFICN link table | Yes (dummy) |
| `crsp_mf_returns.dta` | CRSP | Monthly mutual fund returns and NAV | Yes (dummy) |
| `crsp_mf_summary_annual.dta` | CRSP | Annual mutual fund characteristics | Yes (dummy) |
| `crsp_mf_summary_quarterly.dta` | CRSP | Quarterly mutual fund characteristics | Yes (dummy) |
| `thompson_s12_data.dta` | Thomson Reuters | Raw s12 fund holdings | Yes (dummy) |
| `thompson_s12_summary.dta` | Thomson Reuters | Manager-level s12 summary stats | Yes (dummy) |
| `s12_fundno_wficn_link.dta` | Thomson/MFlinks | s12 fundno to WFICN link table | Yes (dummy) |
| `vanguard_s12_fundno.dta` | Thomson Reuters | Vanguard S12 fund number mapping | Yes (dummy) |
| `thompson_13f_holdings.dta` | Thomson Reuters | 13F institutional holdings | Yes (dummy) |
| `thompson_13f_summary.dta` | Thomson Reuters | Aggregated 13F summary data | Yes (dummy) |
| `thompson_insiders_table1.dta` | Thomson Reuters | Insider transactions data | Yes (dummy) |
| `ccm_mapping.dta` | CRSP/Compustat | CCM linking table | Yes (dummy) |
| `ccm_monthly_security.dta` | CRSP/Compustat | CCM linked monthly stock data | Yes (dummy) |
| `compustat_compensation.dta` | Compustat | Equity compensation data | Yes (dummy) |
| `compustat_short_interest.dta` | Compustat | Short interest data | Yes (dummy) |
| `ff_profit.dta` | Compustat | Profitability and accounting data | Yes (dummy) |
| `imported_compensation_data.dta` | Compustat/WRDS | Conditional compensation data from 10-Ks | Yes (dummy) |
| `leverage_ratios.dta` | WRDS/Compustat | Leverage ratios and accounting data | Yes (dummy) |
| `raw_buyback_data.dta` | Compustat | Share repurchase data | Yes (dummy) |
| `wrds_ratio_book_to_market.dta` | WRDS/Compustat | Book-to-market ratios | Yes (dummy) |
| `active_share_major_benchmarks.dta` | S12 and Martijn Cremers/Tim Riley | Active Share and benchmarks assigned to funds | Yes (dummy) |
| `russell_1000_2000.dta` | Russell | Russell 1000/2000 constituent data | Yes (dummy) |
| `russ_monthly.dta` | Russell | Russell all index constituent data | Yes (dummy) |
| `sp_1500.dta` | S&P | S&P index constituent data | Yes (dummy) |
| `ff3daily.dta` | Ken French's Website | Daily 3-factor returns | Yes (dummy) |
| `capm_betas.dta` | WRDS | CAPM beta estimates | Yes (dummy) |
| `fred_cpi.dta` | FRED | CPI series | Yes (dummy) |
| `raw_bushee_data.dta` | Brian Bushee | Institutional investor classifications | Yes (dummy) |
| `seo_imported.dta` | Bloomberg | Seasoned equity offering data | Yes (dummy) |


## List of Figures, Tables, and Programs

| Figure/Table # | Program | Output file | Note |
| --- | --- | --- | --- |
| Figure 1 | 1_bin_scatter.do | figure1_binscatter_ew.png | Binscatter of investor group demand on index fund demand |
| Figure 2a | 4a_aum_by_type.do | figure2a_total_holdings.png | Total holdings of broad-based and style funds over time |
| Figure 2b | 4c_compare_turnover.do | figure2b_turnover.png | Turnover of broad-based vs. style funds |
| Figure 2c | 4c_compare_turnover.do | figure2c_num_holdings.png | Number of holdings for broad-based vs. style funds |
| Figure 2d | 4b_compare_weights_mkt_weights.do | figure2d_weight_diffs.png | Portfolio weights for broad-based vs. style funds |
| Figure 3 | 8_compensation_magnitudes.do | figure3_binscatter_stock_comp.png | Residual firm group vs. share compensation |
| Figure 4 | 9_rsu_psu_magnitudes.do | figure4_conditional_issuance.png | Residual firm group and conditional compensation |
| Table 1 | 2_ols_regression.do | table1_marketclearing_regs.dta | OLS market-clearing regressions  |
| Table 2 | 2_ols_regression.do | table2_marketclearing_regs_pos_neg.dta | OLS regressions by pos/neg index fund demand |
| Table 3 | 3_horizon.do | table3_marketclearing_regs_horizon.dta | OLS regressions by horizon |
| Table 4 | 4d_var_decomp.do | table4_var_decomp.csv | Variance decomposition of index fund demand by source |
| Table 5 | 5_iv_baseline.do | table5_iv.txt | Baseline IV regressions |
| Table 6 | 5b_iv_bygroup.do | table6_iv_by_group.txt | IV regressions by group and flow direction |
| Table 7 | 6_ols_regression_heterogenity.do | table7_tobins_q.txt | OLS regressions by Tobin's Q quintile |
| Table 8 | 7_decomposition_seo_bb_comp.do | table8_firm_decomp.dta | OLS regressions with firm decomposition |

# Codebook

A full codebook that describes each variable and table is provided in `sammon_shim_codebook.xlsx`.

Variable labels are included in most of the pseudo-data `.dta` files themselves. In the codebook, note that we provide descriptions only for the variables used in our empirical analysis. The underlying WRDS datasets contain many additional fields that are not documented here for brevity.

# Data Sources and Citations

All datasets are accessed via institutional subscriptions through WRDS
unless otherwise noted below.

------------------------------------------------------------------------

## CRSP (Center for Research in Security Prices)

**Provider:** Center for Research in Security Prices (CRSP), University of
Chicago Booth School of Business, accessed via Wharton Research Data Services (WRDS)

**Datasets Used:**

- CRSP Daily Stock Files (1970--2023)
- CRSP Daily Index File
- CRSP Monthly Stock Files
- CRSP Monthly Index File
- CRSP Total Market Index
- CRSP Ticker--CUSIP Mapping File
- CRSP Mutual Fund Database
- CRSP/MFLinks linking tables
- CRSP/Compustat Merged (CCM) Linking Table

**Suggested Citation:**

> Center for Research in Security Prices (CRSP). CRSP US Stock Database and CRSP US Mutual Fund Database. University of Chicago Booth School of Business. Accessed via WRDS.

------------------------------------------------------------------------

## Compustat

**Provider:** S&P Global Market Intelligence (via WRDS)

**Datasets Used:**

- Compustat Fundamentals Annual and Quarterly
- Compustat Short Interest

**Suggested Citation:**

> S&P Global Market Intelligence. Compustat North America Fundamentals. Accessed via Wharton Research Data Services (WRDS).

------------------------------------------------------------------------

## Thomson Reuters Institutional Holdings (13F and S12)

**Provider:** Refinitiv (formerly Thomson Reuters), accessed via WRDS

**Datasets Used:**

- Thomson Reuters 13F Institutional Holdings
- Thomson Reuters Mutual Fund (S12) Holdings
- Insider Transactions dataset
- MFLinks mapping tables (provided by WRDS)

**Suggested Citation:**

> Refinitiv (formerly Thomson Reuters). Institutional (13F) Holdings and Mutual Fund (S12) Holdings Databases. Accessed via WRDS.

------------------------------------------------------------------------

## Index Constituents and Benchmark Data

**Providers:**

- FTSE Russell
- Standard & Poor's (S&P)

**Datasets Used:**

- Russell 1000 / 2000 / 2500 constituent files
- Russell Monthly Index Constituents
- S&P 1500 constituent data

**Suggested Citation:**

> FTSE Russell. Russell Indexes.
>
> Standard & Poor's. S&P Index Constituents.

------------------------------------------------------------------------

## Fama--French Factors

**Provider:** Kenneth R. French Data Library

**Datasets Used:**

- Daily Fama--French Factors

**Suggested Citation:**

> Fama, Eugene F., and Kenneth R. French. "Common Risk Factors in the Returns on Stocks and Bonds." *Journal of Financial Economics* 33 (1993): 3--56.
>
> Data retrieved from the Kenneth R. French Data Library.

------------------------------------------------------------------------

## CAPM Betas

**Provider:** WRDS Beta Suite

**Suggested Citation:**

> WRDS Beta Suite. Wharton Research Data Services.

------------------------------------------------------------------------

## WRDS Financial Ratios Suite

**Provider:** WRDS

**Datasets Used:**

- WRDS Financial Ratios

**Suggested Citation:**

> WRDS Financial Ratios Suite. Accessed via Wharton Research Data Services (WRDS).

------------------------------------------------------------------------

## CPI (Inflation Data)

**Provider:** Federal Reserve Bank of St. Louis (FRED)

**Dataset Used:**

- U.S. Consumer Price Index (CPI)

**Suggested Citation:**

> Federal Reserve Bank of St. Louis. FRED Economic Data. Consumer Price Index Series.

------------------------------------------------------------------------

## Bushee Institutional Classification Data

**Provider:** Brian Bushee (University of Pennsylvania)

**Dataset Used:**

- Institutional investor classifications merged with 13F data

**Suggested Citation:**

> Bushee, Brian J. Institutional Investor Classification Data.

------------------------------------------------------------------------

## Seasoned Equity Offering (SEO) Data

**Provider:** Bloomberg

**Dataset Used:**

- Seasoned equity offering transactions

**Suggested Citation:**

> Bloomberg L.P. Equity Offering Data.


# Acknowledgements

No templates were used in the preparation of this readme file. 